PNG (Portable Network Graphics) Specification, Eighth Draft

By:

Thomas Boutell, boutell@netcom.com
Mark Adler, madler@cco.caltech.edu
Lee Daniel Crocker, lcrocker@netcom.com
Tom Lane, tgl@sss.pgh.pa.us

Additional names will be added shortly.

Permission granted to reproduce this specification in complete and unaltered form. Excerpts may be printed with the following notice: "excerpted from the PNG (Portable Network Graphics) specification, eighth draft." No notice is required in software that follows this specification; notice is only required when reproducing or excerpting from the specification itself.

The authors wish to acknowledge the contributions of the Portable Network Graphics mailing list and the readers of comp.graphics.

This is the eighth draft of the PNG (formerly "PBF") specification discussion document, replacing all previous drafts. There are several significant changes from the previous drafts.

Changes in the Eighth Draft

Chunk naming now covers public/private, critical/ancillary, and copy/don't copy; naming convention frozen
CRC *per chunk*; no tail CRC needed (TAIL chunk now marks end of stream)
hIST: Ancillary histogram chunk added
tRNS: "cheap" alpha channel for palette-based images
HEAD: better scheme for color types; grayscale with alpha channel added
Filters now work byte-wise regardless of bit depth
Thumbnails removed (see rationale)
Finalization Schedule
gAMA: completely rewritten

1. Rationale

The PNG format is intended to provide a portable, legally unencumbered, simple, lossless, streaming-capable, well-compressed, well-specified standard for bitmapped image files which gives new features to the end user at minimal cost to the developer.

It has been asked why the PNG format is not simply an extension of the GIF format. The short answer is that the GIF format is embroiled in legal disputes, does not support 24-bit images and lacks the option of an alpha channel.

It has been asked why the PNG format is not TIFF, or a subset of TIFF. The answer is that TIFF does not support a compression scheme that is not legally encumbered, and that a subset of TIFF would simply frustrate users making the reasonable assumption that a file saved as TIFF from Software XYZ will load into a program supporting our flavor of TIFF. Implementing full TIFF would violate the simplicity constraint.

It has been asked why the PNG format is not IFF, or a sub- or superset of IFF. The same concern applies as with TIFF: users with software that purports to generate IFF files will not be pleased when those files do not load in programs supporting the new specification. In addition, the IFF specification has rarely been accurately implemented and there is considerable disagreement among implementations. The IFF file structure could be used, but was not designed with streaming applications in mind; there are workarounds for this, but they are not widely implemented.

It has been asked why PNG does not include lossy compression. The answer is that JPEG already does an excellent job of lossy compression, and there is no reason to repeat that effort. Different tools, different jobs.

It has been asked why PNG uses network byte order. We have selected one byte ordering and used it consistently. Which order in particular is of little relevance, but network byte order has the advantage that routines to convert to and from it are already available on any platform that supports TCP/IP networking, including all PC platforms. The functions are trivial and will be included in the reference implementation. (Note that, in any case, the difficulty of implementing compression is considerably greater than that of handling byte order.)

It has been asked why PNG does not directly support multiple images. A metaformat will be created which permits multiple images and uses PNG image streams internally, with certain minimal alterations, such as the optional omission of palettes. In such a metaformat, the identifying bytes at the beginning will NOT be the same as for PNG.

A related question: it has been asked why PNG does not specify a thumbnail view chunk. In discussions with actual software vendors who use thumbnails in their products, it has become clear that most would not use a "standard" thumbnail chunk. This is partly because every vendor has a distinct idea of what the dimensions and characteristics of a thumbnail should be, and partly because vendors who keep thumbnails in separate files now to accommodate many image formats are not going to stop doing that simply because of a thumbnail chunk in one new format. Vendors may certainly create proprietary thumbnail chunks in accordance with the specification of private chunks below.

PNG has been expressly designed not to be completely dependent on a single compression technique. Although inflate/deflate compression is mentioned in this document, PNG would still exist without it.

PNG supports a full alpha channel as well as a "cheap" alpha channel as an adjunct to the palette. This allows both highly flexible transparency and compression efficiency.

3. Data Representation Note

Integer Values and Byte Order

All integers which are not 1 byte integers will be in network byte order, which is to say the most significant byte comes first, and the less significant bytes in descending order of significance (MSB LSB for two-byte integers, B3 B2 B1 B0 for 4-byte integers). References to bit 7 refer to the highest bit (128) of a byte; references to bit 0 refer to the lowest bit (1) of a byte. Values are unsigned unless otherwise noted. Values explicitly noted as signed are represented in two's complement notation.
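
As a minimal sketch of the byte ordering described above, the following Python fragment uses the standard struct module, whose ">" prefix selects big-endian (network) byte order. The helper names are illustrative only and are not defined by this specification.

```python
import struct

def pack_u32(value):
    """Encode an unsigned 4-byte integer, most significant byte first."""
    return struct.pack(">I", value)

def unpack_u32(data):
    """Decode a 4-byte network-order unsigned integer (B3 B2 B1 B0)."""
    return struct.unpack(">I", data)[0]

def unpack_u16(data):
    """Decode a 2-byte network-order unsigned integer (MSB LSB)."""
    return struct.unpack(">H", data)[0]
```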

Color Values

All color values range from zero (black) to most intense at the maximum value for the bit depth. The "gAMA" chunk specifies the gamma response of the source device, and viewers are strongly encouraged to properly compensate.

Note that the maximum value at a given bit depth is not 2^bitdepth, but rather (2^bitdepth)-1. When scaling values with a bit depth that cannot be directly represented in PNG (4 bit truecolor, for instance), an excellent approximation to the correct value can be achieved by shifting the valid bits to begin in bit 7 (the most significant bit) and repeating the most significant bits into the open bits.

For example, if 5 bits per channel are available in the source data, an acceptable conversion to a bitdepth of 8 can be achieved as follows:

If the red value for a pixel in the source data is 27 (in a range from 0-31), then the original bits are:

4 3 2 1 0
---------
1 1 0 1 1

Converted to a bitdepth of 8:

7 6 5 4 3  2 1 0
----------------
1 1 0 1 1  1 1 0
|=======|  |===|
    |      Leftmost Bits Repeated to Fill Open Bits
    |
Original Bits
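
The bit replication shown above can be sketched in Python as follows. The function name and its parameters are hypothetical; the routine simply repeats the source bits from the top of the destination downward until every open bit is filled.

```python
def scale_by_replication(value, src_bits, dst_bits):
    """Widen value from src_bits to dst_bits by repeating its leading bits."""
    result = 0
    filled = 0
    while filled < dst_bits:
        shift = dst_bits - filled - src_bits
        if shift >= 0:
            # A whole copy of the source bits still fits.
            result |= value << shift
        else:
            # Only the most significant bits of the source fit.
            result |= value >> -shift
        filled += src_bits
    return result
```

For the example above, scale_by_replication(27, 5, 8) yields 222, which is 11011110 in binary.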

Pixel dimensions

Non-square pixels can be represented, but viewers are not required to account for them; see the "pHYS" chunk.

4. The Format

The Identification Header

The first eight bytes always contain the following values:

137 80 78 71 13 10 26 10

The first two bytes distinguish the file on systems that expect the first two bytes to identify the file uniquely. Bytes two through four (overlap with the first two intentional) name the format. The CR-LF sequence catches bad file transfers that alter these characters. The control-Z character stops file display under MSDOS. The final line feed checks for the inverse of the CR-LF translation problem.
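
A decoder might test for the identification header along these lines. This is only a sketch; the constant and function names are illustrative.

```python
# The eight identifying bytes listed above.
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])

def has_png_signature(data):
    """Return True if data begins with the eight identifying bytes."""
    return data[:8] == PNG_SIGNATURE
```

A file damaged by CR-LF translation will fail this comparison, which is the intent of the design.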

The PNG Image Stream

The remainder of the file consists of a PNG image stream, which consists of a series of chunks as described below. The first chunk of the image stream is always a HEAD chunk; the last chunk is always a TAIL chunk.

Chunk Structure

Each chunk consists of:

Length Field: A 4-byte, unsigned integer indicating the size of the chunk data field (see below), which is not to exceed (2^31)-1 bytes. The length does not include the four-byte chunk type immediately following, and does not include the four-byte CRC field. Note that this design allows for a chunk to be skipped even if the implementation does not recognize that particular chunk type. Zero is a valid length.
Chunk Type Field: A 4-byte chunk type consisting of uppercase and lowercase ASCII letters. Case is significant; see "Chunk Naming", below.
Chunk Data: The data bytes appropriate to that chunk, if any. This field may be of zero length.
CRC: A 4-byte CRC (Cyclic Redundancy Check) calculated on the preceding bytes in that chunk, including the chunk name and chunk data fields, but not including the length field. The CRC for each chunk is required in order to detect badly-transferred images as quickly as possible. The length is excluded in order to permit CRC calculation while data is generated (and before the length is known), avoiding an extra pass over the data. See section 5 (details of specific algorithms) for more information on CRC calculation.
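
The chunk layout above might be read as in the following sketch. The function name is hypothetical, and the CRC is returned without being verified, because the CRC algorithm is defined in section 5 rather than here. The CRC value in the usage lines is a placeholder, not a computed checksum.

```python
import io
import struct

def read_chunk(stream):
    """Read one chunk; return (chunk_type, data, crc). The CRC is not checked."""
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("truncated chunk header")
    # 4-byte length, then 4-byte chunk type, both before the data.
    length, chunk_type = struct.unpack(">I4s", header)
    if length > 2**31 - 1:
        raise ValueError("chunk length exceeds (2^31)-1")
    data = stream.read(length)
    crc_bytes = stream.read(4)
    if len(data) != length or len(crc_bytes) != 4:
        raise ValueError("truncated chunk")
    (crc,) = struct.unpack(">I", crc_bytes)
    return chunk_type.decode("ascii"), data, crc

# Usage with a made-up chunk; 0x12345678 is a placeholder CRC.
raw = struct.pack(">I", 3) + b"tEXt" + b"abc" + struct.pack(">I", 0x12345678)
example_chunk = read_chunk(io.BytesIO(raw))
```

Because the length comes first, a decoder that does not recognize the chunk type can discard the data and CRC and continue with the next chunk.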

Chunk Naming

Note that the same chunk type can appear more than once if necessary, but only as specified in the description of the chunk. This permits encoders to break the image into multiple IDAT chunks for streaming output and also permits multiple tEXt chunks.

The four-byte chunk type should consist entirely of ASCII letters. The case of each letter is significant! The significance of case is as follows:

Critical/Ancillary: First Letter

Chunks which are not strictly necessary in order to meaningfully display the contents of the file are known as "ancillary" chunks, and their names must begin with a lowercase letter. Decoders encountering an unknown chunk beginning with a lowercase letter may safely ignore it and display the image. The offset chunk (offs) is an example of an ancillary chunk.

Chunks which are critical to the successful display of the file's contents begin with an uppercase letter. Decoders encountering an unknown chunk beginning with an uppercase letter must indicate to the user that the image contains information it cannot safely interpret and refuse to display the contents of the file. The image header chunk (HEAD) is an example of a critical chunk.

If a chunk is critical, its name begins with an uppercase letter. If a chunk is ancillary its name begins with a lowercase letter.

Public/Private: Second Letter

If the chunk is public (part of this specification or a later edition of this specification), its second letter is uppercase. If your application requires proprietary chunks, and you have no interest in seeing the software of other vendors recognize them, use a lowercase second letter in the chunk name. Such names will never be assigned in the official specification.

RESERVED (always uppercase): Third Letter

The significance of the case of the third letter of the chunk name is reserved for future expansion; at the present time all chunk names will have uppercase third letters.

Don't-Copy/Copy: Fourth Letter

When a PNG stream is read, modified and written out again, certain ancillary chunks may need to be changed to reflect changes in other chunks. For example, a histogram chunk needs to be changed if the image data changes. If the encoder does not recognize histogram chunks, copying them blindly to a new output stream is incorrect; it is preferable to drop the chunk.

If the fourth letter of a chunk name is lowercase, and the encoder does not recognize the chunk, it may be copied regardless of other changes. If the fourth letter of a chunk name is capitalized, and the encoder does not recognize the chunk, and any changes have been made to critical chunks, the chunk should not be copied to the output PNG stream. (Of course, if the encoder does recognize the chunk, it may output an appropriately modified version.)

Note that encoders which do not recognize a critical chunk should indicate an error and refuse to process that PNG stream at all. The copy/don't copy mechanism is intended for use with ancillary chunks.

The fourth letter is always capitalized for critical chunks.
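
The four case conventions above can be summarized in a short sketch. The function names and dictionary keys are illustrative and are not part of the specification.

```python
def chunk_properties(name):
    """Decode the meaning carried by the case of each letter of a chunk type."""
    if len(name) != 4 or not (name.isascii() and name.isalpha()):
        raise ValueError("chunk type must be four ASCII letters")
    return {
        "critical": name[0].isupper(),         # lowercase first letter: ancillary
        "public": name[1].isupper(),           # lowercase second letter: private
        "reserved_ok": name[2].isupper(),      # third letter is currently uppercase
        "copy_if_unrecognized": name[3].islower(),
    }

def may_copy_unrecognized(name, critical_chunks_changed):
    """Decide whether an editor may copy an ancillary chunk it does not recognize."""
    return name[3].islower() or not critical_chunks_changed
```

For instance, hIST is ancillary and public with an uppercase fourth letter, so an editor that does not recognize it should drop it whenever critical chunks have changed.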

Registering Proprietary Chunks

If you want others outside your organization to understand a chunk type that you invent, CONTACT THE AUTHOR OF THE PNG SPECIFICATION (boutell@netcom.com) and specify the format of the chunk's data and your preferred chunk type. The author will assign a permanent, unique chunk type. The chunk type will be publicly listed in an appendix of extended chunk types which can be optionally implemented. Note that the creation of new critical chunk types is discouraged unless absolutely necessary. This process will begin as soon as the basic specification is finalized. In the event that Mr. Boutell is unable to maintain the specification, the task will be passed on to a qualified volunteer or organization.

If you do not require or desire that others outside your organization understand the chunk type, you may use a private chunk name by specifying a lowercase letter for the second character (see above).

Please note that if you want to use these proprietary chunks for information that is not essential to view the image, and have any desire whatsoever that others not using your internal viewer software be able to view the image, you should use an ancillary chunk type (first character is lowercase) rather than a critical chunk type (first character uppercase).

Also note that others may use the same proprietary chunk names, so it is advantageous to keep additional identifying information at the beginning of the chunk.

Chunk Ordering

Rules regarding chunk order are stated in the description of each chunk.

Standard Chunks

All implementations must understand and successfully render the critical chunks below.

Standalone image viewers ideally should also be capable of displaying the ancillary chunks below, such as tEXt, but this is not necessary for applications in which many images may be displayed at once (e.g., WWW browsers).

Chunk Type    Description               

HEAD          Bitmapped image header

              This chunk must appear FIRST if the file contains
              a bitmapped image.

              Width:            4 bytes
              Height:           4 bytes
              Bit depth:        1 byte
              Color type:       1 byte 
              Compression type: 1 byte
              Filter type:      1 byte
              Interlace type:   1 byte

              Width and height are 4-byte integers. Zero
              is an invalid value. The maximum for both
              is (2^31)-1 in order to accommodate languages
              which have difficulty with unsigned 4-byte values.

              Bit depth is a single-byte integer. Valid values
              that software must support are 1, 2, 4, 8, and 16.
              (Note that bit depths of 16 are easily supported on
              8-bit display hardware by dropping the least
              significant byte.)
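
              The field layout above might be unpacked as in the
              following sketch. It assumes the HEAD data consists of
              exactly the 13 bytes listed, and the function name and
              dictionary keys are hypothetical.

```python
import struct

VALID_DEPTHS = (1, 2, 4, 8, 16)

def parse_head(data):
    """Unpack the HEAD fields and check width, height and bit depth."""
    if len(data) != 13:
        # Assumption: no fields follow the seven listed above.
        raise ValueError("HEAD data must be 13 bytes")
    (width, height, bit_depth, color_type,
     compression, filter_type, interlace) = struct.unpack(">IIBBBBB", data)
    for label, value in (("width", width), ("height", height)):
        if not 1 <= value <= 2**31 - 1:
            raise ValueError(label + " out of range")
    if bit_depth not in VALID_DEPTHS:
        raise ValueError("invalid bit depth")
    return {
        "width": width, "height": height, "bit_depth": bit_depth,
        "color_type": color_type, "compression": compression,
        "filter": filter_type, "interlace": interlace,
    }
```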

              Color type is a single-byte integer. Valid values
              are 0, 2, 3, 4, and 6. Color type determines the
              interpretation of the image data.

              The legal color type values represent certain 
              sums of the following values: 1 (palette used),
              2 (color used), and 4 (full alpha used). The bit
              depth restrictions for each type are present both to 
              simplify implementation and to prohibit certain combinations
              which do not compress well in practice. 
               
              Note that full alpha channel, where present, is always
              represented by one byte per pixel, even when the bit
              depth is 16.

              Color Type  Valid Bit Depths  Interpretation
              0           1,2,4,8,16        Each pixel value is a grayscale 
                                            level, where the largest value is 
                                            white and zero is black.

              2           8,16              Each pixel value is a three-value
                                            series: red (0 = black, max = red),
                                            green (0 = black, max = green),
                                            blue (0 = black, max = blue).

              3           1,2,4,8           Each pixel value is a palette 
                                            index; a PLTE chunk will appear.

              4           8,16              Each pixel value is a grayscale 
                                            level, where the largest value is 
                                            white and zero is black, followed
                                            by an alpha channel byte. Alpha
                                            channel is a single byte, even
                                            when the pixel value is two-byte.

              6           8,16              Each pixel value is a four-value
                                            series: red (0 = black, max = red),
                                            green (0 = black, max = green),
                                            and blue (0 = black, max = blue),
                                            followed by an alpha channel
                                            byte (0 = transparent, 
                                            255 = opaque). Alpha channel
                                            is always a single byte, even
                                            when the RGB levels are two-byte.
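
              The table above can be expressed as a lookup, as in
              this sketch. The names are illustrative only.

```python
# Valid bit depths for each legal color type, from the table above.
VALID_BIT_DEPTHS = {
    0: (1, 2, 4, 8, 16),   # grayscale
    2: (8, 16),            # RGB
    3: (1, 2, 4, 8),       # palette index
    4: (8, 16),            # grayscale with alpha
    6: (8, 16),            # RGB with alpha
}

def is_valid_combination(color_type, bit_depth):
    """Return True if the color type is legal and permits this bit depth."""
    return bit_depth in VALID_BIT_DEPTHS.get(color_type, ())

def describe_color_type(color_type):
    """Split a color type into the three flag values it is the sum of."""
    return {
        "palette": bool(color_type & 1),
        "color": bool(color_type & 2),
        "alpha": bool(color_type & 4),
    }
```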

              Compression Type

              Compression type indicates the compression scheme
              which will be used to compress the image data.

              This draft proposes use of the inflate/deflate compression 
              scheme, an LZ77 derivative which is used in zip, gzip, pkzip 
              and related programs, because extensive research has been done
              supporting its legality. Inflate and deflate code
              is available in the zip/unzip packages with a very
              permissive license (yes, permissive enough for
              commercial purposes, see those packages for details).

              At present, only compression type 0 (inflate/deflate 
              compression with a 32K sliding window) is defined. At present, 
              all standard PNG images must be compressed with this scheme.

              Filter Type

              Several filters are defined by the PNG specification. The
              purpose of these filters is to prepare the image data for
              optimum compression. 

              Filters are applied to bytes, not to 
              pixels, even if the bit depth is larger than a byte.

              All PNG filters are strictly lossless. Decoders must 
              understand filters in order to display the image.

              Filters defined in PNG:
              Byte   Name 
              0      None
              1      Cross
              2      Sub

              Filter Recommendations:

              Filter 0 (none) should be used for color type 3
              (palette-based color).

              Filter 0 (none) should be used on images of bit depths
              other than 8 or 16. 

              Filter type 0 (none) should be used if it is known
              that the image was converted from an 8-bit, palette-based 
              source, such as a GIF image converted to a JPEG.
           
              Filter type 1 (cross) should be used for other noninterlaced
              images.

              Filter type 2 (sub) should be used for other interlaced images.

              See section 5 (details of specific algorithms) for
              exact definitions of the cross and sub filters. 
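
              An encoder could apply the recommendations above as in
              the following sketch. The function name and the
              from_palette_source flag are hypothetical; the flag
              stands for the encoder's knowledge that the image came
              from an 8-bit, palette-based source.

```python
FILTER_NONE, FILTER_CROSS, FILTER_SUB = 0, 1, 2

def recommend_filter(color_type, bit_depth, interlaced,
                     from_palette_source=False):
    """Pick a filter type following the recommendations in the text."""
    if color_type == 3 or bit_depth not in (8, 16) or from_palette_source:
        return FILTER_NONE
    # Remaining images: sub when interlaced, cross otherwise.
    return FILTER_SUB if interlaced else FILTER_CROSS
```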
                
              Interlace Type

              At present, there are two legal values for
              interlace type: 0 (no interlace) or 1
              (line-wise interlace).

              With interlace type 0, rows are laid out
              continuously from top to bottom.

              With interlace type 1, rows are stored in the 
              following order:     

              Every eighth row, starting from row 0
              Every eighth row, starting from row 4
              Every fourth row, starting from row 2
              Every second row, starting from row 1                   

              The purpose of this feature is to allow images
              to "fade in" in a simple fashion that does
              minimal damage to compression efficiency,
              although the file size is slightly expanded
              on average. 
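
              The storage order for interlace type 1 can be generated
              as in this sketch; the function name is illustrative.

```python
def interlaced_row_order(height):
    """Return row numbers in the order they are stored for interlace type 1."""
    order = []
    # (starting row, step) for each of the four groups listed above.
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order
```

              For an image eight rows high this gives rows
              0, 4, 2, 6, 1, 3, 5, 7; every row appears exactly once.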

              Other interlace types have been proposed, and will
              replace this scheme in the final proposal if the gain 
              in visual quality is sufficient to outweigh any compression 
              penalties.

gAMA          Gamma Correction

              Image gamma value: 2 bytes

              The image gamma chunk specifies the gamma of the camera
              (or simulated camera) that produced the image, and thus
              the gamma of the image with respect to the original scene.

              A value of 10000 represents a gamma of 1.0, a value
              of 4500 a gamma of 0.45, and so on (divide by 10000.0).
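
              As a sketch, the two-byte value could be converted to
              and from a floating-point gamma as follows. The function
              names are hypothetical.

```python
import struct

def decode_gama(data):
    """Convert the 2-byte gAMA value to a floating-point gamma."""
    (raw,) = struct.unpack(">H", data)
    return raw / 10000.0

def encode_gama(gamma):
    """Convert a gamma to the 2-byte gAMA value, rounding to nearest."""
    return struct.pack(">H", int(gamma * 10000 + 0.5))
```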

              Note that this is *not* the same as the gamma of the display
              device that will reproduce the image correctly.  To get
              accurate tone reproduction, the gamma of the display device
              and the gamma of the image file should be reciprocals of
              each other, since the overall gamma of the system is the
              product of the gammas of each component.  So, for example,
              if an image with a gamma of 0.4 is displayed on a CRT with
              a gamma of 2.5, the overall gamma of the system is 1.0.
              An overall gamma of 1.0 gives correct tone reproduction.

              If the encoder does not know the gamma value, it should not
              write a gamma chunk; the absence of a gamma chunk
              indicates the gamma is unknown. If the gamma chunk
              does appear, it must precede the PLTE chunk.

              If it is possible for the encoder to determine the gamma,
              or to make a strong guess based on the hardware on which it
              runs, then the encoder is strongly encouraged to output
              the "gAMA" chunk.

              It is common for images to have a gamma of less than 1 for
              three reasons:

                1) A gamma of around 0.4 enables an image to
                   be directly displayed on a frame buffer driving a CRT
                   without the need for a lookup table (either hardware or
                   software) to correct for the response of the CRT -
                   such images are said to be "gamma corrected".  Most
                   GIF and JFIF images seen on Usenet are gamma corrected
                   (regardless of what the JFIF standard says).

                2) "Gamma correction" is a standard part of all video signals.
                   It makes receiver design easier, but it also reduces noise
                   in the transmission of video signals, both analog and
                   digital.  Video cameras have a gamma of 0.45 (NTSC) or 0.36
                   (PAL/SECAM), so images obtained by frame-grabbing video
                   already have this value of gamma.

                3) This non-linear transformation allocates more of the
                   available pixel codes or voltage range to darker areas
                   of the image.  This allows photographic-quality images
                   to be stored in only 24 bits/pixel without banding
                   artifacts in the darker areas (in most cases).
                   This makes "gamma encoding" a much better way of
                   storing computed images than the more common linear
                   encoding.

              In the case of computer graphics images, if we assume that
              the black-to-white brightness range is represented as
              floating-point values in the range 0 to 1, then gamma encoding
              is performed by:

                gbright = bright ^ gamma
                pixelval = ROUND(gbright * MAXPIXVAL)

              Computer graphics renderers often do not perform this gamma
              encoding, making pixel values directly proportional to
              scene brightness.  This "linear" pixel encoding is equivalent
              to gamma encoding with a gamma of 1.0, so graphics programs
              that produce linear pixels should always put out a "gAMA"
              chunk specifying a gamma of 1.0.
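
              The gamma encoding step above might look like this in
              Python. The function name is illustrative, and ROUND is
              taken here to mean rounding to the nearest integer.

```python
def gamma_encode(bright, gamma, max_pix_val=255):
    """Map a linear brightness in the range 0 to 1 to a gamma-encoded pixel."""
    gbright = bright ** gamma
    # Adding 0.5 before truncating rounds to the nearest integer.
    return int(gbright * max_pix_val + 0.5)
```

              With a gamma of less than 1, mid-range brightness maps
              to a higher pixel value than linear encoding would give,
              which is how more codes end up covering dark areas.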

              To produce correct tone reproduction, a good image display
              program must take into account both the gamma of the image
              file and of the display device.  This can be done by
              calculating

                gbright = pixelval / MAXPIXVAL
                bright = gbright ^ (1.0 / file_gamma)
                gcvideo = bright ^ (1.0 / display_gamma)
                fbval = ROUND(gcvideo * MAXFBVAL)

              where MAXPIXVAL is the maximum pixel value in the file (255
              for 8-bit, 65535 for 16-bit, etc), MAXFBVAL is the maximum
              value of a frame buffer pixel (255 for 8-bit, 31 for 5-bit,
              etc), pixelval is the value of that pixel in the PNG file,
              and fbval is the value to write into the frame buffer.
              The first line converts from pixel code into a normalized
              0 to 1 floating point value, the second "undoes" the encoding
              of the image file, the third line does "gamma correction"
              for the monitor, and the fourth converts to an integer frame
              buffer pixel.

              (Note that this assumes that you want the final image to
              have a gamma of 1.0 relative to the original scene.
              Sometimes it looks better to make the overall gamma a bit
              higher, perhaps 1.25.  To get this, replace the first "1.0" in
              the formula above with "desired_system_gamma".)

              Note that it is not difficult to calculate a gamma
              conversion table; it is *not* necessary to
              perform transcendental math for every pixel!
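
              A conversion table of this kind might be built as in
              the sketch below, which applies the four-step
              calculation once per possible pixel value. The function
              and parameter names are hypothetical.

```python
def build_gamma_table(file_gamma, display_gamma,
                      max_pix_val=255, max_fb_val=255):
    """Return a list mapping each file pixel value to a frame buffer value."""
    table = []
    for pixelval in range(max_pix_val + 1):
        gbright = pixelval / max_pix_val              # normalize to 0..1
        bright = gbright ** (1.0 / file_gamma)        # undo file encoding
        gcvideo = bright ** (1.0 / display_gamma)     # correct for display
        table.append(int(gcvideo * max_fb_val + 0.5)) # round to nearest
    return table
```

              Each pixel is then converted with a single table lookup,
              for example table[pixelval].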

              In practice, it is often difficult to determine
              the gamma of the actual display. It is common to
              assume a display_gamma of 2.2 (or 1.0, on hardware for
              which this value is common) and allow the user to
              modify this value at their option.

              Finally, note that the response of the display is actually
              more complex than can be described by a single number
              (display_gamma).  If actual measurements of the monitor's
              light output as a function of voltage input are available,
              the third and fourth lines of the computation above
              should be replaced by a lookup in these measurements, to find
              the actual frame buffer value that most nearly gives the
              desired brightness.

              Although viewers are strongly encouraged to
              implement gamma correction, in some cases speed
              may be a concern. In these cases, viewers are
              encouraged to have precomputed gamma correction tables for
              file_gamma values of 1.0 and 0.45 and some reasonable
              single display_gamma value, and to use the table
              closest to the gamma indicated in the file.

PLTE          Palette

              This chunk must appear for color type 3, and
              may appear for color types 2 and 6. If this chunk
              does appear, it must precede the first IDAT chunk.

              In the case of color types 2 and 6, the PLTE chunk is 
              optional, and provides a recommended set of from 1 to 256 
              colors to which the true-color image should be quantized if 
              the display hardware cannot display truecolor directly. 
              If it is not present, the viewer must select colors on its own,
              but it is most efficient for this to be done once by
              the encoder. 

              The number of palette entries varies from 1 to 256.
              For color type 3, the number of entries should not
              exceed the range that can be represented by the
              bit depth (for example, 2^4 = 16 for a bit depth of 4).
              Note that this does NOT mean that there have to
              be a full 16 entries. The length of the chunk is used
              to determine the number of entries.

              Each palette entry consists of a 
              three-byte series:

                     red (0 = black, 255 = red),
                     green (0 = black, 255 = green),
                     blue (0 = black, 255 = blue)

              Note that the palette uses 8 bits (1 byte) per value 
              regardless of the image bit depth specification.
              In particular, the palette is 8 bits deep even when it is 
              a suggested quantization of a 16-bit truecolor image.
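
              As a sketch, the palette entries could be extracted from
              the chunk data as follows. The function name and the
              max_entries parameter are illustrative; for color type 3
              a caller would pass 2^bitdepth as the limit.

```python
def parse_plte(data, max_entries=256):
    """Split PLTE chunk data into a list of (red, green, blue) tuples."""
    if len(data) % 3 != 0:
        raise ValueError("PLTE length must be a multiple of 3")
    count = len(data) // 3   # the chunk length determines the entry count
    if not 1 <= count <= min(max_entries, 256):
        raise ValueError("invalid number of palette entries")
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
```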

tRNS          Transparency. Transparency is an alternative to the
              full truecolor alpha channel.
              
              For color type 3:
              A series of alpha channel bytes, corresponding to
              palette indexes in the PLTE chunk.

              The transparency chunk may contain fewer
              alpha channel bytes than there are palette
              entries. In this case, the alpha channel value
              for all remaining palette entries is assumed
              to be 255 (fully opaque, no background visible).
              0 is full transparency (only background visible).

              Decoders which cannot blend colors with the
              background should interpret all nonzero alpha
              values as fully opaque (no background).
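
              For color type 3, the rules above might be applied as in
              this sketch. The function names are hypothetical, and
              rejecting a tRNS chunk longer than the palette is an
              assumption rather than a stated requirement.

```python
def palette_alpha(trns_data, palette_size):
    """Return one alpha value per palette entry, defaulting to 255."""
    if len(trns_data) > palette_size:
        # Assumption: more alpha bytes than palette entries is an error.
        raise ValueError("tRNS has more entries than the palette")
    return list(trns_data) + [255] * (palette_size - len(trns_data))

def to_binary_alpha(alphas):
    """For decoders that cannot blend: treat any nonzero alpha as opaque."""
    return [255 if a != 0 else 0 for a in alphas]
```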

              For color type 0:
              Transparent gray level (2 bytes, range: 0 - (2^bitdepth - 1)) 

              The specified gray level will be regarded as transparent
              (the background color at that position will be substituted 
              consistently by the decoder).

              For color type 2:
              Transparent RGB color (6 bytes, 2 bytes for
              red, green and blue components, range for each:
              0 - (2^bitdepth - 1))

              The specified RGB color will be regarded as transparent
              (the background color at that position will be substituted 
              consistently by the decoder).

              Although transparency is not as elegant as the full
              alpha channel, transparency does not adversely 
              affect the compression of the image. 

              When present, the "tRNS" chunk must precede
              the first IDAT chunk, and follow the
              PLTE chunk, if any.

bKGD          Background color. 

              When displaying the image in a
              stand-alone viewer, it is useful to specify the
              background color against which the image is
              intended to appear.

              For color type 3:
              Background index into palette 
              (1 byte, range: 0 - (size of palette-1) )

              For color types 0 and 4:
              Background gray level (2 bytes, range: 0 - (2^bitdepth - 1)) 

              For color types 2 and 6:
              Background RGB color (6 bytes, 2 bytes for
              red, green and blue components, range for each:
              0 - (2^bitdepth - 1))

              When present, the bKGD chunk must precede
              the first IDAT chunk, and follow the
              PLTE chunk, if any.
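
              The three layouts above could be decoded as in the
              following sketch; the function name is illustrative.

```python
import struct

def parse_bkgd(data, color_type):
    """Decode bKGD data according to the image's color type."""
    if color_type == 3:
        (index,) = struct.unpack(">B", data)   # 1-byte palette index
        return index
    if color_type in (0, 4):
        (gray,) = struct.unpack(">H", data)    # 2-byte gray level
        return gray
    if color_type in (2, 6):
        return struct.unpack(">HHH", data)     # 2 bytes each: red, green, blue
    raise ValueError("invalid color type")
```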

hIST          Histogram.

              When displaying a palette-color image (color type 3),
              it is often necessary to render the image using fewer
              colors than are actually present in the image.
              To produce the highest-quality result, it is helpful
              to have information on the frequency with which each
              palette index actually appears, in order to choose
              the best palette for dithering or drop the least-used colors.
              Since images are often created once and viewed many
              times, it makes sense to calculate this information
              in the encoder, although it is not mandatory.

              The hIST chunk, if it appears, must be preceded
              by the PLTE chunk, and must precede the first
              IDAT chunk. hIST only appears for color type 3.

              The histogram consists of a series of 16-bit
              unsigned values, one for each entry in the
              PLTE chunk.

              Each entry is approximately equal to the
              number of pixels with that palette index,
              multiplied by 65535, and divided by the
              total number of pixels, rounding up. 

              However, it is critical that the entry for a
              given palette index be at least one (1) if at least
              one (1) pixel uses that palette index.

              Pseudocode to determine the histogram entry for palette
              index "X", allowing for round-off errors,
              is given below:

   scaledFrequency = pixelsWithIndex(X) * (65535.0 / pixelsTotal)

   IF scaledFrequency >= 65535.0 THEN
     histogramEntryForIndex(X) = 65535
   ELSE 
     histogramEntryForIndex(X) = ceil(scaledFrequency)
   END IF 

   IF pixelsWithIndex(X) > 0 AND histogramEntryForIndex(X) = 0 THEN
     histogramEntryForIndex(X) = 1
   END IF
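
              The pseudocode above can be sketched in Python as
              follows. The names hist_entry and histogram are
              hypothetical, and the image is assumed to contain
              at least one pixel.

```python
import math


def hist_entry(pixels_with_index: int, pixels_total: int) -> int:
    """Scaled 16-bit histogram entry for one palette index."""
    scaled = pixels_with_index * (65535.0 / pixels_total)
    # Clamp to guard against floating-point round-off above 65535.
    entry = 65535 if scaled >= 65535.0 else math.ceil(scaled)
    # An index that is used at all must never report zero.
    if pixels_with_index > 0 and entry == 0:
        entry = 1
    return entry


def histogram(indices, palette_size: int):
    """Build the full histogram from a sequence of palette indices."""
    counts = [0] * palette_size
    for i in indices:
        counts[i] += 1
    total = len(indices)
    return [hist_entry(c, total) for c in counts]
```

              For example, histogram([0, 0, 1, 3], 4) gives a
              nonzero entry for indices 0, 1 and 3 and a zero
              entry for the unused index 2.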

tEXt          Text. 

              "tEXt" chunks consist of a null-terminated keyword
              containing any sequence of ISO 8859-1 (LATIN-1)
              characters (the LATIN-1 set includes the ASCII set),
              followed by the text associated with that keyword.
              Note that the text is not null-terminated
              (the length of the chunk is sufficient information
              to locate the ending).

              Any number of "tEXt" chunks may appear, and more than
              one with the same keyword is permissible.

              The following text keywords are predefined
              and should be used where possible:

              Title                 
              Author
              Copyright
              Description

              Other keywords, containing any sequence of
              printable characters in the character set, may
              be invented for other purposes.
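
              A minimal Python sketch of splitting and building
              a tEXt payload is given below. The function names
              are hypothetical, and decoding the text portion as
              LATIN-1 is an assumption; the description above
              names the character set only for the keyword.

```python
def parse_text_chunk(data: bytes):
    """Split a tEXt payload into (keyword, text)."""
    keyword, sep, text = data.partition(b"\x00")
    if not sep:
        raise ValueError("tEXt payload has no null after the keyword")
    # The text runs to the end of the chunk; it has no terminator.
    return keyword.decode("latin-1"), text.decode("latin-1")


def build_text_chunk(keyword: str, text: str) -> bytes:
    """Build a tEXt payload: keyword, one null byte, then the text."""
    if "\x00" in keyword:
        raise ValueError("keyword must not contain a null character")
    return keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
```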

phys          Physical pixel dimensions.
              4 bytes: pixels per unit, X axis (unsigned integer)
              4 bytes: pixels per unit, Y axis (unsigned integer)
              1 byte: unit specifier

              The following values are legal for the unit specifier:
              0: units unknown (aspect ratio only)
              1: unit is the meter

              Large units are employed to ensure sufficient
              resolution. If this ancillary chunk is not present,
              pixels are assumed to be square, and the physical
              size of each pixel is unknown. (Conversion note: one inch
              is equal to 2.54 centimeters, and therefore to 0.0254 meters.)
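
              The conversion note can be illustrated with a
              short Python sketch that turns dots per inch into
              pixels per meter and reads the 9-byte payload. The
              function names are hypothetical, and the byte
              order of the 4-byte integers (most significant
              byte first) is an assumption.

```python
import struct


def dpi_to_pixels_per_meter(dpi: float) -> int:
    """Convert dots per inch to pixels per meter (1 inch = 0.0254 m)."""
    return round(dpi / 0.0254)


def parse_phys(data: bytes):
    """Return (x pixels per unit, y pixels per unit, unit specifier)."""
    # Two unsigned 4-byte integers followed by a 1-byte unit specifier.
    return struct.unpack(">IIB", data)
```

              With unit specifier 0 only the ratio of the two
              values is meaningful; with unit specifier 1 they
              are pixels per meter.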

offs          Physical image offset.
              
              4 bytes: image position in microns (X axis)
              4 bytes: image position in microns (Y axis)

              The position on a printed page at which the image
              should be output when printed alone. Note that the
              origin is at the upper left corner
              of the page. 

time          Time of image creation. Greenwich Mean Time (GMT)
              should be specified, not local time.

              2 bytes: Year (complete; e.g., 1995)
              1 byte: Month (1-12)
              1 byte: Day (1-31)
              1 byte: Hour (0-23)
              1 byte: Minute (0-59)
              1 byte: Second (0-59)
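
              The 7-byte layout above can be packed and unpacked
              as in the following Python sketch. The function
              names are hypothetical, the byte order of the
              2-byte year (most significant byte first) is an
              assumption, and GMT is represented here by UTC.

```python
import struct
from datetime import datetime, timezone


def build_time(dt: datetime) -> bytes:
    """Pack a timezone-aware datetime into the 7-byte time payload."""
    dt = dt.astimezone(timezone.utc)  # store GMT, not local time
    return struct.pack(">HBBBBB", dt.year, dt.month, dt.day,
                       dt.hour, dt.minute, dt.second)


def parse_time(data: bytes) -> datetime:
    """Unpack the 7-byte time payload into a UTC datetime."""
    year, month, day, hour, minute, second = struct.unpack(">HBBBBB", data)
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone.utc)
```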
 
IDAT          Image data.

              The image data will be compressed using the
              compression scheme indicated by the compression
              type field of the HEAD chunk.

              IMPORTANT: the compressed image data is the concatenation
              of the contents of ALL the IDAT chunks. (If there are
              multiple IDAT chunks, they will always appear
              sequentially.) Viewers must be able to interpret images
              that contain multiple IDAT chunks.

              Simply speaking, the viewer knows it is not finished until it 
              has uncompressed and interpreted as many pixels as are indicated 
              by the image dimensions in the HEAD chunk. This rule
              exists to permit encoders to work in a fixed
              amount of memory by outputting multiple chunks.
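
              The following Python sketch shows one way a viewer
              might treat several IDAT payloads as a single
              compressed stream. The function name decode_idat
              is hypothetical, and the use of Python's zlib
              module (deflate inside a zlib wrapper) is an
              assumption made for illustration; the actual
              scheme is the one named by the compression type
              field.

```python
import zlib


def decode_idat(payloads, expected_len: int) -> bytes:
    """Decompress concatenated IDAT payloads until enough bytes arrive."""
    decompressor = zlib.decompressobj()
    out = bytearray()
    for payload in payloads:
        # Chunk boundaries are arbitrary; feed each piece as it comes.
        out += decompressor.decompress(payload)
        if len(out) >= expected_len:
            break  # all pixels implied by the image dimensions are in
    return bytes(out[:expected_len])
```

              Here expected_len stands for the number of
              uncompressed bytes implied by the image dimensions
              in the HEAD chunk.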

              The following text describes the uncompressed
              data stream which will be fed to the compressor
              or received from the decompressor.

              Individual pixel values are determined in accordance
              with the rules set out in the description of each
              color type in the HEAD chunk. 

              Pixels are always laid out left to right in 
              each row, and rows are arranged from
              top to bottom, except as modified by
              the interlace type field of the HEAD chunk.

              Alpha, when present, always occupies a single byte
              for each pixel.

              For bit depths less than 8, pixels are always
              arranged from left to right in each byte.
              That is, in a grayscale image with a bit depth
              of 1, the first pixel of a line appears in 
              bit 7 (128) of the first byte.
 
              Note that consecutive lines never share a byte.
              That is, if the last pixel of a line ends on the third
              bit of a byte, the rest of that byte is unused, and the
              first pixel of the next line begins on bit 7 (the
              leftmost bit) of the next byte.
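
              The packing rules for bit depths below 8 can be
              sketched in Python as follows. The function names
              row_bytes and pack_row are hypothetical; the
              sketch handles one line at a time, so that no byte
              is ever shared between lines.

```python
def row_bytes(width: int, bits_per_pixel: int) -> int:
    """Number of bytes occupied by one line, rounded up to a whole byte."""
    return (width * bits_per_pixel + 7) // 8


def pack_row(pixels, bit_depth: int) -> bytes:
    """Pack one line of pixel values (bit depth 1, 2 or 4) into bytes."""
    out = bytearray(row_bytes(len(pixels), bit_depth))
    mask = (1 << bit_depth) - 1
    for i, value in enumerate(pixels):
        bit_pos = i * bit_depth
        # The first pixel of each byte lands in the leftmost bits.
        shift = 8 - bit_depth - (bit_pos % 8)
        out[bit_pos // 8] |= (value & mask) << shift
    return bytes(out)
```

              An 11-pixel line at bit depth 1 occupies two
              bytes; the last pixel falls on the third bit of
              the second byte and the remaining five bits are
              left at zero.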

              The resulting bytes are then passed through the appropriate
              filter, as specified by the Filter Type field
              of the HEAD chunk. The result of this operation
              is then passed to the compression algorithm,
              as specified by the Compression Type field
              of the HEAD chunk.
               
TAIL          End of PNG Image Stream

              The TAIL chunk is empty.

              The TAIL chunk must appear last in a PNG image stream,
              and also marks the end of a PNG file.

              The TAIL chunk no longer contains a cumulative CRC,
              since every chunk now contains its own CRC.

5. Details of Specific Algorithms

Inflate and Deflate

See the zip/unzip package, which includes source code for both purposes in the files inflate.c and deflate.c, with a very permissive license. Documentation of the compression scheme is also available; see the zip/unzip package for references. (zip/unzip and pkzip are compatible but not identical. pkzip is commercial software.)

A formal, detailed specification of inflate and deflate will be included in the final standard, and is being written at this time. The formal specification will be compatible with the format defined by the inflate.c/deflate.c code.

The Sub Filter

The sub filter is used to improve compression on interlaced truecolor images (color types 2 and 6) and interlaced 8- and 16-bit grayscale images (color types 0 and 4).

Apply the following formula to each byte of each line of raw data, where x ranges from zero to the total number of bytes representing that line, minus one (1):

Sub(x) = Raw(x) - Raw(x-bpp)

The formula is applied byte by byte, regardless of bit depth. bpp is defined as the number of bytes per complete pixel, rounding up to one (1). For instance, for color type 2 with a bit depth of 16, bpp is equal to 6 (three channels, two bytes per channel); for color type 0 with a bit depth of 2, bpp is equal to 1 (rounding up); for color type 4 with a bit depth of 16, bpp is equal to 3 (two-byte grayscale value, plus one-byte alpha channel).

Important: for all x < 0, Raw(x) = 0.
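
A minimal Python sketch of the sub filter and its inverse follows. The function names are hypothetical. The subtraction is taken modulo 256, which is an assumption carried over from the "unsigned modulo arithmetic" wording used for the cross filter; the one-byte alpha in bytes_per_pixel follows the examples given for bpp above.

```python
def bytes_per_pixel(color_type: int, bit_depth: int) -> int:
    """bpp as described above: bytes per complete pixel, at least 1."""
    channels = {0: 1, 2: 3, 3: 1, 4: 1, 6: 3}[color_type]
    bits = channels * bit_depth
    if color_type in (4, 6):
        bits += 8  # alpha occupies a single byte per pixel in this draft
    return max(1, (bits + 7) // 8)


def sub_filter(raw: bytes, bpp: int) -> bytes:
    """Sub(x) = Raw(x) - Raw(x-bpp), with Raw(x) = 0 for x < 0."""
    return bytes((raw[x] - (raw[x - bpp] if x >= bpp else 0)) & 0xFF
                 for x in range(len(raw)))


def sub_unfilter(filtered: bytes, bpp: int) -> bytes:
    """Reverse the sub filter by adding back the reconstructed byte."""
    raw = bytearray(len(filtered))
    for x in range(len(filtered)):
        left = raw[x - bpp] if x >= bpp else 0
        raw[x] = (filtered[x] + left) & 0xFF
    return bytes(raw)
```

For a smooth gradient the filtered line consists mostly of small, repeated differences, which is what makes it compress better than the raw bytes.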

The Cross Filter

The cross filter is used to improve compression on non-interlaced truecolor images (color types 2 and 6) and 8- and 16-bit grayscale images (color types 0 and 4). Cross is similar to sub, but takes the previous line into account; this is highly effective as long as the image is not interlaced.

Output the following value, using unsigned modulo arithmetic, for each byte of the raw data, where x ranges from 0 to the total number of bytes representing that line, minus one (1), and y ranges from 0 to the total number of rows in the image, minus one (1):

Raw(x)(y) - Raw(x-bpp)(y) - Raw(x)(y-1) + Raw(x-bpp)(y-1)

Important: Whenever x < 0 or y < 0, Raw(x)(y) = 0.

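To reverse the effect of the cross filter after decompression, output the following value for each byte, again using unsigned modulo arithmetic:

Crossed(x)(y) + Raw(x-bpp)(y) + Raw(x)(y-1) - Raw(x-bpp)(y-1)

where Crossed(x)(y) is the filtered byte and the Raw values are bytes that have already been reconstructed. Store each result as Raw(x)(y) for use in uncrossing subsequent bytes.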
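
A minimal Python sketch of the cross filter and its inverse follows. The function names are hypothetical, arithmetic is modulo 256, and the inverse uses the same bpp offset as the forward formula so that the two are exact opposites.

```python
def cross_filter(rows, bpp: int):
    """Apply the cross filter to a list of equal-length raw lines."""
    def raw(x: int, y: int) -> int:
        # Bytes to the left of or above the image count as zero.
        return rows[y][x] if x >= 0 and y >= 0 else 0

    return [bytes((raw(x, y) - raw(x - bpp, y)
                   - raw(x, y - 1) + raw(x - bpp, y - 1)) & 0xFF
                  for x in range(len(row)))
            for y, row in enumerate(rows)]


def cross_unfilter(filtered, bpp: int):
    """Reverse the cross filter, rebuilding lines from top to bottom."""
    rows = []
    for y, line in enumerate(filtered):
        prev = rows[y - 1] if y > 0 else None
        cur = bytearray(len(line))
        for x in range(len(line)):
            left = cur[x - bpp] if x >= bpp else 0
            up = prev[x] if prev is not None else 0
            up_left = prev[x - bpp] if prev is not None and x >= bpp else 0
            cur[x] = (line[x] + left + up - up_left) & 0xFF
        rows.append(bytes(cur))
    return rows
```

The decoder needs only the previous reconstructed line and the current one, so it can run in a fixed amount of memory.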

The Alpha Channel

Standalone image viewers can ignore the alpha channel, provided that they properly skip over it in order to be in the right position to read the next pixel. However, if the background color has been set with the bKGD chunk, the alpha channel can be meaningfully interpreted with respect to it even in a standalone image viewer.

World Wide Web browsers and the like should regard any pixel with an alpha channel value of zero as transparent (the pixel should be given the background color of the browser), and any pixel with the maximum alpha channel value for that bit depth as opaque (not blending with the background at all).

Viewers which are not in a position to smoothly combine foreground and background colors should regard any nonzero alpha channel value as fully opaque (fully foreground color).

For applications that do not require a full alpha channel, or cannot afford the price in compression efficiency, the tRNS transparency chunk is also available.
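
The two viewer behaviors described in this section can be sketched in Python as follows. The function names are hypothetical, and the linear blend used for intermediate alpha values is an assumption; the text above fixes only the two endpoints (zero is transparent, the maximum value is opaque).

```python
def composite(fg: int, bg: int, alpha: int, max_alpha: int = 255) -> int:
    """Blend one foreground sample over a background sample.

    Linear interpolation with rounding; an assumed formula, not one
    given by the text.
    """
    return (fg * alpha + bg * (max_alpha - alpha) + max_alpha // 2) // max_alpha


def threshold(fg: int, bg: int, alpha: int) -> int:
    """For viewers that cannot blend: any nonzero alpha is fully opaque."""
    return bg if alpha == 0 else fg
```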

CRC Calculation

The CRC polynomial employed is as follows:

x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1

CRC computation is not difficult, nor as computationally intensive as the above may suggest. Pseudocode will appear in the next draft of this document.
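
As a hedged illustration, the following Python sketch computes a CRC one bit at a time with the polynomial above. The text specifies only the polynomial; the bit ordering (least significant bit first), the all-ones initial value and the final inversion used here are assumptions borrowed from the convention used by zip, and the function name is hypothetical.

```python
# The polynomial above with the x^32 term dropped and the remaining
# coefficients written least-significant-bit first.
POLY_REFLECTED = 0xEDB88320


def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32 (zip convention, assumed for illustration)."""
    crc ^= 0xFFFFFFFF  # start from all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if it was set.
            crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final inversion
```

A table-driven version processes a whole byte per step and is the usual choice in practice; the loop above is kept for clarity.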

6. Finalization Schedule

It is anticipated that a final draft, barring necessary revisions stemming from implementation, will be available by the end of February, 1995. It is anticipated that a reference implementation will be available by the end of March, after which all changes made to the standard chunk set will be backward-compatible. The reference implementation will be freely usable in all applications, including commercial applications.

7. Pronunciation

PNG is pronounced "ping."

End of PNG Specification
